Sequential Analysis with More than Two Alternative Hypotheses, and its
Relation to Discriminant Function Analysis

By P. Armitage 

Medical Research Council Statistical Research Unit, London School of Hygiene
and Tropical Medicine


[Received November 11th, 1949] 


1. Introduction 

The theory of sequential tests for deciding between two alternative simple hypotheses is now 
well known, and is described by Wald (1947). In a review of Wald’s book, Barnard (1947) 
pointed out that, as a generalization of Wald’s method, one could formulate a procedure for 
deciding between more than two simple hypotheses. The procedure would be based on likelihood 
ratios, and the risks involved could be controlled by suitable choice of the acceptance conditions. 

The present note contains an outline of the theory of such sequential procedures, which are 
closely related to some recent developments in the theory of discriminant functions. The methods 
should prove useful for two-sided tests of statistical hypotheses. In particular, I consider a 
two-sided test for the value of a binomial probability, and, as a development of this, a two-sided 
test for comparative trials. 

Since the present paper was submitted for publication Sobel and Wald (1949) have published 
details of a sequential decision procedure for choosing one of three mutually exclusive and 
exhaustive composite hypotheses about the mean of a normal distribution. Their procedure 
involves the combination of two tests for distinguishing between two pairs of simple hypotheses, 
and is closely related to that suggested by Armitage (1947) for a two-sided sequential t-test. The
discussion at the end of §4 of the present paper has been added since the publication of Sobel and 
Wald’s paper. 


2. Theory 

Consider $k$ simple hypotheses, $H_1, H_2, \ldots, H_k$, and let the likelihood from a single
observation, when $H_i$ is true, be $L_i$. There are $\tfrac{1}{2}k(k-1)$ likelihood ratios for
various pairs of hypotheses, but each of these may be expressed in terms of $k-1$ independent
likelihood ratios, which may be chosen in any one of a number of different ways. We could take,
for instance,

$$R_i = L_i/L_k, \qquad i = 1, 2, \ldots, k-1,$$

and let

$$y_i = \log R_i.$$


(The base of the logarithms is arbitrary.) Then the logarithm of the likelihood ratio for any
two hypotheses is either one of the $y_i$, or a difference between two of the $y_i$.

Successive observations are taken to be independent, and the logarithm of the likelihood ratio
for two of the hypotheses, after $n$ observations, will be of the form

$$\sum y_i \quad \text{or} \quad \sum (y_i - y_j),$$

the summation being over the $n$ observations.

Let us formulate the following rule of procedure. The observations are taken sequentially,
until all the inequalities in one of the following $k$ sets are simultaneously satisfied:

$$
\left.
\begin{aligned}
\sum (y_i - y_1) &> A_{i,1} \\
&\;\;\vdots \\
\sum (y_i - y_{i-1}) &> A_{i,i-1} \\
\sum (y_i - y_{i+1}) &> A_{i,i+1} \\
&\;\;\vdots \\
\sum (y_i - y_{k-1}) &> A_{i,k-1} \\
\sum y_i &> A_{i,k}
\end{aligned}
\right\}
\quad \text{in which case } H_i \text{ is accepted} \quad (i = 1, 2, \ldots, k-1), \tag{1}
$$

or

$$
\left.
\begin{aligned}
\sum y_1 &< -A_{k,1} \\
&\;\;\vdots \\
\sum y_{k-1} &< -A_{k,k-1}
\end{aligned}
\right\}
\quad \text{in which case } H_k \text{ is accepted}, \tag{2}
$$

the $A_{i,j}$ being arbitrary positive constants.

The constants $A_{i,j}$ could conveniently be made equal, say $A_{i,j} = A$. In this case the
inequalities specify that sampling continues until the likelihood ratios of one of the hypotheses
against each of the others are all greater than antilog $A$. (This procedure was suggested by
Barnard (1947).) For generality we retain different values of $A_{i,j}$.
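The rule with a common constant $A$ can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sequential_choice` and `bernoulli_log_likelihood`, the calling interface, and the use of natural logarithms are assumptions made here and do not come from the original note.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def sequential_choice(
    observations: Iterable[float],
    log_likelihoods: Sequence[Callable[[float], float]],
    threshold: float,
) -> Tuple[Optional[int], int]:
    """Return (index of the accepted hypothesis or None, observations used).

    log_likelihoods[i](x) gives log L_i for one observation x (hypothetical
    interface). Hypothesis i is accepted as soon as
    sum(log L_i) - sum(log L_j) > threshold for every j != i.
    """
    totals = [0.0] * len(log_likelihoods)
    n = 0
    for x in observations:
        n += 1
        for i, log_l in enumerate(log_likelihoods):
            totals[i] += log_l(x)
        for i, t_i in enumerate(totals):
            if all(t_i - t_j > threshold
                   for j, t_j in enumerate(totals) if j != i):
                return i, n
    return None, n  # observations ran out before any hypothesis was accepted


def bernoulli_log_likelihood(p: float) -> Callable[[float], float]:
    """Log-likelihood of one 0/1 observation when the event probability is p."""
    return lambda x: math.log(p) if x == 1 else math.log(1.0 - p)
```

With the three illustrative hypotheses p = 0.2, 0.5, 0.8 and threshold log 19, an unbroken run of events leads to acceptance of the third hypothesis at the seventh observation, since 7 log 1.6 is the first multiple of log 1.6 to exceed log 19.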

We shall now show that, if a sequential procedure is defined by the inequalities (1) and (2),
the probability that no decision has been reached by the $n$th stage tends to zero as $n$ increases
indefinitely. This result is true under hypotheses which are not necessarily confined to the set
$H_1, \ldots, H_k$, provided that the distributions of the likelihood functions $y_i$ satisfy
certain regularity conditions.

If, at the $n$th stage, no decision has been reached, then, at each stage up to and including the
$n$th, at least one of the following ${}^{k}C_{2}$ relations must have been satisfied:

$$
\begin{aligned}
-A_{k,i} &< \sum y_i < A_{i,k} && (i = 1, \ldots, k-1), \\
-A_{j,i} &< \sum (y_i - y_j) < A_{i,j} && (i, j = 1, \ldots, k-1).
\end{aligned} \tag{3}
$$

The probability that this condition is satisfied is less than the probability that at least one of the
inequalities (3) is satisfied at the $n$th stage, irrespective of the previous stages. Provided the
variances of the distributions of $y_i$ and $(y_i - y_j)$ all exist, we can apply the central limit
theorem to the distribution of the sums $\sum y_i$ and $\sum (y_i - y_j)$. It follows that the
probability that a given inequality of the set (3) is satisfied at the $n$th stage can be made as
small as we wish by a suitable choice of $n$. The probability that at least one of the inequalities
(3) is satisfied at the $n$th stage therefore tends to zero as $n$ tends to infinity, and finally the
probability that the procedure has terminated by the $n$th stage tends to unity as $n$ tends to
infinity.

Let $\pi_{ij}$ be the probability of accepting $H_i$ when in fact $H_j$ is true. By considering the
total probability of all samples which result in $H_i$ being accepted, we see from (1) and (2) that

$$1 > \pi_{ii} > B_{i,j}\,\pi_{ij} \qquad (i, j = 1, 2, \ldots, k;\ i \neq j),$$

where $B_{i,j} = \operatorname{antilog} A_{i,j}$. Hence

$$\pi_{ij} < 1/B_{i,j} \qquad (i \neq j),$$

and

$$\pi_{ii} = 1 - \sum_{j \neq i} \pi_{ji} > 1 - \sum_{j \neq i} 1/B_{j,i} \qquad (i = 1, 2, \ldots, k). \tag{4}$$

The inequalities (4) may be used to control the risks of error associated with any sequential
procedure. By choosing the $A_{i,j}$ sufficiently large we can make the probabilities of arriving at
the correct conclusion, when any one of the $H_i$ is true, as large as we wish. These inequalities
are, however, conservative, in the sense that the true probabilities may be considerably higher than
the lower bound given by (4). As we shall see below, it is possible in certain problems to assert
that certain of the $\pi_{ij}$ are effectively zero, and so improve on the inequalities (4).
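As a small numerical sketch of how the bound (4) might be used in practice, the following Python functions evaluate the lower bounds for a given array of constants and find the common constant $A$ at which the bound reaches a chosen level. The function names and the choice of natural logarithms are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def correct_decision_lower_bounds(a: Sequence[Sequence[float]]) -> List[float]:
    """Lower bounds of (4): pi_ii > 1 - sum over j != i of 1 / B[j][i].

    a[j][i] holds the constant A_{j,i} (natural logarithms assumed), so that
    B[j][i] = exp(a[j][i]). Diagonal entries are ignored.
    """
    k = len(a)
    return [
        1.0 - sum(math.exp(-a[j][i]) for j in range(k) if j != i)
        for i in range(k)
    ]


def common_threshold(k: int, max_error: float) -> float:
    """Common A at which the bound in (4) equals 1 - max_error.

    With A_{i,j} = A for all pairs, the bound is 1 - (k - 1) / exp(A).
    """
    return math.log((k - 1) / max_error)
```

For three hypotheses and a tolerated error of 0.1, the common constant is log 20, and each bound is then 1 - 2/20 = 0.9.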


3. Application to Discriminant Function Analysis 

Welch (1939) and, more recently, Smith (1947) have pointed out that Fisher’s linear function 
for discriminating between two multivariate normal populations, with different means but the 
same covariance matrix, is equivalent to the likelihood ratio between the two populations. Rao 
(1948) has used this fact in developing a method of discrimination between k > 2 populations. 
Apart from an adjustment for a priori knowledge, Rao’s method consists in assigning an individual 
to the $i$th group if (in our notation)

$$
\left.
\begin{aligned}
(y_i - y_1) &> 0 \\
&\;\;\vdots \\
(y_i - y_{i-1}) &> 0 \\
(y_i - y_{i+1}) &> 0 \\
&\;\;\vdots \\
y_i &> 0
\end{aligned}
\right\}
\quad (i = 1, 2, \ldots, k-1),
$$

$$
\left.
\begin{aligned}
y_1 &< 0 \\
&\;\;\vdots \\
y_{k-1} &< 0
\end{aligned}
\right\}
\quad (i = k).
$$

In other words, the individual is assigned to the most likely population. The probabilities of
correctly or incorrectly classifying an individual from any one group can be calculated by means
of the probability integral of the multivariate normal distribution of the $y_i$, which is tabulated
for $k = 2$ and $k = 3$.*
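The assignment rule can be illustrated with a short Python sketch for normal populations sharing one covariance matrix. The function names, the list-based vectors, and the decision to pass the inverse covariance matrix (`precision`) directly are assumptions of this sketch, not part of Rao's or the present paper's notation.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def _mat_vec(m: Matrix, v: Vector) -> List[float]:
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def discriminant_scores(x: Vector, means: Sequence[Vector],
                        precision: Matrix) -> List[float]:
    """y_i = log(L_i / L_k), i = 1..k-1, for normal populations with a common
    covariance matrix; `precision` is the (symmetric) inverse of that matrix."""
    mu_k = means[-1]
    w_k = _mat_vec(precision, mu_k)
    scores = []
    for mu_i in means[:-1]:
        w_i = _mat_vec(precision, mu_i)
        linear = _dot(w_i, x) - _dot(w_k, x)
        constant = 0.5 * (_dot(mu_i, w_i) - _dot(mu_k, w_k))
        scores.append(linear - constant)
    return scores


def most_likely_population(x: Vector, means: Sequence[Vector],
                           precision: Matrix) -> int:
    """Zero-based index of the population with the largest likelihood at x."""
    ys = discriminant_scores(x, means, precision) + [0.0]  # y_k = 0
    return max(range(len(ys)), key=ys.__getitem__)
```

Appending zero for the last population reduces the sets of inequalities to picking the largest of $y_1, \ldots, y_{k-1}, 0$, which is the same as picking the most likely population.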

Suppose we have a group of individuals known to come from one or other of the k populations, 
and we wish to decide which one this is. The problem is clearly equivalent to that discussed in 
§2, and the rules of procedure for a sequential method are given by (1) and (2). The risks of 
incorrect classification are controlled by (4). This sequential procedure has in fact already been 
suggested by Rao (1947), for the case k = 2. 

As an illustration we may consider the case $k = 3$, with discriminant functions

$$y_1 = \log (L_1/L_3),$$

$$y_2 = \log (L_2/L_3).$$

In the standard method described by Rao we should assign an individual to populations 1, 2 or 3
according to which region of Fig. 1 the point $(y_1, y_2)$ lay in.

In the sequential method we continue to sample until the point $(\sum y_1, \sum y_2)$ first falls in
one of the three shaded areas of Fig. 2.

If, in particular, we choose $A_{ij} = A$ for $i, j = 1, 2, 3$, we find from (4) that the
probabilities that an individual from each of the three populations will be correctly classified are
all greater than $1 - 2/(\operatorname{antilog} A)$. By choosing $A$ sufficiently large, and the
boundaries of the three regions sufficiently far from the origin, these probabilities may be brought
as near to unity as we please.

* For k = 3 there are two discriminant functions, and the integral of the bivariate normal surface
may be obtained from Tables VIII and IX of Pearson (1931).

[Fig. 1.]

4. Test for Three Binomial Probabilities 

Denote by $p$ the probability of occurrence of a certain event in each of a series of independent
trials, and let $H_1$, $H_2$ and $H_3$ be the hypotheses that $p$ takes the values $p_1$, $p_2$ and
$p_3$ respectively, where $p_1 < p_2 < p_3$. Suppose we wish to decide between $H_1$, $H_2$ and
$H_3$. If, after $n$ trials, the event has occurred $m$ times, we have

$$
\left.
\begin{aligned}
\sum y_1 = \sum \log (L_1/L_3) &= m \log (p_1/p_3) + (n - m) \log \frac{1 - p_1}{1 - p_3} \\
\sum y_2 = \sum \log (L_2/L_3) &= m \log (p_2/p_3) + (n - m) \log \frac{1 - p_2}{1 - p_3}
\end{aligned}
\right\} \tag{5}
$$

the summations, as usual, being over the $n$ trials.

By (1) and (2) we accept $H_1$ at the $n$th stage if $\sum (y_1 - y_2) > A_{12}$ and
$\sum y_1 > A_{13}$; we accept $H_2$ if $\sum (y_1 - y_2) < -A_{21}$ and $\sum y_2 > A_{23}$; and we
accept $H_3$ if $\sum y_1 < -A_{31}$ and $\sum y_2 < -A_{32}$.
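The acceptance conditions can be sketched in Python as follows. The sums are computed from the binomial likelihoods $p^m (1-p)^{n-m}$. The function names, the use of natural logarithms, and the simplification to a single common constant `a` for all $A_{ij}$ are assumptions made for this sketch.

```python
import math
from typing import Optional, Tuple


def log_lr_sums(n: int, m: int, p1: float, p2: float,
                p3: float) -> Tuple[float, float]:
    """Sums of y_1 = log(L_1/L_3) and y_2 = log(L_2/L_3) after m occurrences
    of the event in n trials."""
    def against_p3(p: float) -> float:
        return (m * math.log(p / p3)
                + (n - m) * math.log((1.0 - p) / (1.0 - p3)))
    return against_p3(p1), against_p3(p2)


def decide(n: int, m: int, p1: float, p2: float, p3: float,
           a: float) -> Optional[int]:
    """Return 1, 2 or 3 for the accepted hypothesis, or None if sampling
    should continue. A common constant a = A_ij is assumed."""
    s1, s2 = log_lr_sums(n, m, p1, p2, p3)
    if s1 - s2 > a and s1 > a:
        return 1
    if s1 - s2 < -a and s2 > a:
        return 2
    if s1 < -a and s2 < -a:
        return 3
    return None
```

For example, with the illustrative values p1 = 0.2, p2 = 0.5, p3 = 0.8 and a = log 19, ten occurrences in twenty trials give acceptance of the second hypothesis, while six occurrences in six trials are not yet enough to accept the third.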

These conditions may be seen to lead to a graphical procedure of plotting the point (n — m, m) 
on Barnard’s inspection diagram,* with boundaries as shown in Fig. 3. 



Fig. 3. The inspection diagram in a sequential test for three binomial probabilities.


The only part of the inspection diagram which is used is, of course, the positive quadrant,
and in Fig. 3 the boundaries of the acceptance regions for $H_1$ and $H_3$ are shown as being linear
within this quadrant. This is not necessarily so.

* Cf. Barnard (1946) or Stockman and Armitage (1946) (who use the term “lattice diagram”). Wald
uses a similar diagram with n as abscissa and m as ordinate.

The boundary $\sum y_2 = -A_{32}$ meets the $m$-axis where $m_2 = -A_{32}/\log (p_2/p_3)$, and the
line $\sum y_1 = -A_{31}$ meets the $m$-axis where $m_1 = -A_{31}/\log (p_1/p_3)$. In order that the
two lines meet outside the positive quadrant we require

$$\frac{A_{32}}{A_{31}} > \frac{\log (p_2/p_3)}{\log (p_1/p_3)}. \tag{6}$$

The right-hand member of (6) is less than unity, and the inequality is satisfied if, for instance,
$A_{31} = A_{32}$. The boundary of the acceptance region for $H_3$ is then linear. Similarly, the
boundary of the acceptance region for $H_1$ is linear if

$$\frac{A_{13}}{A_{12}} < \frac{\log [(1 - p_1)/(1 - p_3)]}{\log [(1 - p_1)/(1 - p_2)]},$$

which is certainly true if $A_{13} = A_{12}$, since $p_2 < p_3$.
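The two linearity conditions are easily checked numerically. The sketch below reads (6) as $A_{32}/A_{31} > \log(p_2/p_3)/\log(p_1/p_3)$ and the companion condition as $A_{13}/A_{12} < \log[(1-p_1)/(1-p_3)]/\log[(1-p_1)/(1-p_2)]$. The function names are hypothetical.

```python
import math


def h3_boundary_is_linear(a31: float, a32: float,
                          p1: float, p2: float, p3: float) -> bool:
    """Condition (6): A_32 / A_31 > log(p2/p3) / log(p1/p3)."""
    return a32 / a31 > math.log(p2 / p3) / math.log(p1 / p3)


def h1_boundary_is_linear(a12: float, a13: float,
                          p1: float, p2: float, p3: float) -> bool:
    """A_13 / A_12 < log[(1-p1)/(1-p3)] / log[(1-p1)/(1-p2)]."""
    ratio = (math.log((1.0 - p1) / (1.0 - p3))
             / math.log((1.0 - p1) / (1.0 - p2)))
    return a13 / a12 < ratio
```

With equal constants both functions return `True` for any $p_1 < p_2 < p_3$, in agreement with the remarks in the text. Strongly unequal constants can make either boundary non-linear within the positive quadrant.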

It is interesting to note that the boundaries of Fig. 3 are almost the same as they would be if
one were running two tests of the type considered by Wald in parallel: one for testing $H_1$ against
$H_2$ and the other for testing $H_2$ against $H_3$. Such a combination of tests was suggested by
Armitage (1947) as a sequential analogue of the two-sided t-test, the problem having been first
reduced to a two-sided test of a binomial probability. The near equivalence of the two procedures is
clear from the similarity between Fig. 2 of the 1947 paper and Fig. 3 of the present paper. (A slight
difference is that the type of path exemplified by the dotted line on the 1947 diagram might, with
a small probability, lead to the acceptance of $H_1$ or $H_3$ by the present procedure, whereas (in
the present notation) $H_2$ would immediately be accepted by the previous procedure. This is clearly
a very small discrepancy.)

This equivalence enables us to improve on the limits of error given by (4). Suppose, for
simplicity, that $A_{ij} = A$, $B_{ij} = B$. Then, from (4),

$$
\left.
\begin{aligned}
\pi_{11} &> 1 - 2/B \\
\pi_{22} &> 1 - 2/B \\
\pi_{33} &> 1 - 2/B
\end{aligned}
\right\} \tag{7}
$$

Now, if we were running two separate tests, with boundaries as in Fig. 3, and if in each test
$\alpha$ were the probability that any hypothesis would be rejected when true, we should have (by
Wald’s theory)

$$B \cong (1 - \alpha)/\alpha, \tag{7a}$$

and (7) become, approximately,

$$
\left.
\begin{aligned}
\pi_{11} &> 1 - 2\alpha/(1 - \alpha) \cong 1 - 2\alpha \\
\pi_{22} &> 1 - 2\alpha/(1 - \alpha) \cong 1 - 2\alpha \\
\pi_{33} &> 1 - 2\alpha/(1 - \alpha) \cong 1 - 2\alpha
\end{aligned}
\right\} \tag{8}
$$

if $\alpha$ is small.

But, by direct considerations (the argument here follows that of the 1947 paper), we see that
the probability, in the combined test, of accepting $H_1$ when true is effectively the same as that
of accepting $H_1$ when true in the separate test for $H_1$ against $H_2$, namely $1 - \alpha$. Hence

$$\pi_{11} \cong 1 - \alpha, \tag{9}$$

and similarly

$$\pi_{33} \cong 1 - \alpha. \tag{10}$$

The probability, in the combined test, of rejecting $H_2$ when true is very nearly equal to the
probability of rejecting $H_2$ when true in the test for $H_1$ against $H_2$, together with the
probability of rejecting $H_2$ when true in the test for $H_2$ against $H_3$. Hence

$$\pi_{22} \cong 1 - 2\alpha. \tag{11}$$

A comparison between (8) and (9)-(11) shows that two of the three inequalities were unnecessarily
wide. The reason is that, in this particular problem, when $H_1$ is true we are very unlikely to
accept $H_3$, and vice versa, so that $\pi_{13}$ and $\pi_{31}$ are very small.
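The size of the improvement is easily seen numerically. The following Python sketch sets the general bound from (7) and (7a) beside the direct approximations (9)-(11) for a chosen $\alpha$. The function name and the dictionary keys are illustrative assumptions.

```python
from typing import Dict, Tuple


def bounds_from_alpha(alpha: float) -> Tuple[float, float, Dict[str, float]]:
    """Return (B, general lower bound, direct approximations).

    B comes from (7a), the general bound 1 - 2/B from (7), and the direct
    approximations from (9)-(11).
    """
    b = (1.0 - alpha) / alpha
    general = 1.0 - 2.0 / b  # equals 1 - 2 * alpha / (1 - alpha), as in (8)
    direct = {
        "pi_11": 1.0 - alpha,
        "pi_22": 1.0 - 2.0 * alpha,
        "pi_33": 1.0 - alpha,
    }
    return b, general, direct
```

For $\alpha = 0.05$ this gives $B = 19$ and a general bound of about 0.895, whereas the direct argument gives approximately 0.95 for $\pi_{11}$ and $\pi_{33}$ and 0.90 for $\pi_{22}$.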

We may note, from the nature of Fig. 3, that the curve of the mean sample size against $p$ will
have a minimum when $p$ is near $p_2$, and two maxima, one when $p$ is between $p_1$ and $p_2$, and
the other when $p$ is between $p_2$ and $p_3$.

The relation between the present procedure and that given by Sobel and Wald (1949) becomes
clear if we consider the following three mutually exclusive and exhaustive composite hypotheses:

$$H_1^*: p < a_1; \qquad H_2^*: a_1 \le p \le a_2; \qquad H_3^*: p > a_2,$$

where $a_1$ and $a_2$ are constants satisfying $p_1 < a_1 < p_2 < a_2 < p_3$.

A sequential procedure for deciding between $H_1^*$, $H_2^*$ and $H_3^*$ may be formulated by using
exactly the same acceptance criteria as are given below equation (5), except that $H_1^*$, $H_2^*$
and $H_3^*$ replace $H_1$, $H_2$ and $H_3$ respectively. It follows from (9)-(11) that this decision
procedure satisfies the following requirements:

(a) If $p < p_1$, the probability is $> 1 - \alpha$ that $H_1^*$ will be accepted.

(b) If $p_1 < p < p_2$, the probability is $> 1 - \alpha$ that either $H_1^*$ or $H_2^*$ will be
accepted, i.e. that $H_3^*$ will be rejected.

(c) If $p = p_2$, the probability is $1 - 2\alpha$ that $H_2^*$ will be accepted.

(d) If $p_2 < p < p_3$, the probability is $> 1 - \alpha$ that either $H_2^*$ or $H_3^*$ will be
accepted, i.e. that $H_1^*$ will be rejected.

(e) If $p > p_3$, the probability is $> 1 - \alpha$ that $H_3^*$ will be accepted.

If, in the method of Sobel and Wald, we consider the particular case in which (in their notation)
$\theta_2 = \theta_3$, and if we formulate an analogous problem in which the mean of a normal
distribution is replaced by a binomial probability, we arrive at the above procedure, $p_1$, $p_2$
and $p_3$ being equivalent to $\theta_1$, $\theta_2$ and $\theta_3$ respectively.

5. A Two-sided Test for Comparative Trials 

Another problem which may be reduced to a sequential binomial test is the comparative
trial for the difference between the parameters of two binomial distributions. Suppose we have
two binomial distributions with parameters $\theta_1$ and $\theta_2$. In Chapter 6 of his book Wald
(1947) suggests that successive observations from the two populations should be paired. Denoting an
observation by 0 or 1, we consider only the pairs (0, 1) or (1, 0). If $\theta_1$ and $\theta_2$ are
the probabilities that an observation will be a 1, the probability that a member of the sequence of
pairs (0, 1) and (1, 0) will in fact be (0, 1) is

$$p = \frac{\theta_2 (1 - \theta_1)}{\theta_1 (1 - \theta_2) + \theta_2 (1 - \theta_1)}. \tag{12}$$

If $\theta_1 = \theta_2$, $p = \tfrac{1}{2}$. If $\theta_1 > \theta_2$, $p < \tfrac{1}{2}$; and if
$\theta_2 > \theta_1$, $p > \tfrac{1}{2}$.

The hypothesis $H_2$ that $\theta_1 = \theta_2$ is therefore equivalent to the hypothesis that
$p = \tfrac{1}{2}$. As alternative hypotheses about the “difference” between $\theta_1$ and
$\theta_2$ we could conveniently choose the hypothesis $H_1$ that $p = p_1 < \tfrac{1}{2}$, and the
hypothesis $H_3$ that $p = p_3 > \tfrac{1}{2}$. A simple hypothesis about $p$ is a composite
hypothesis about the original observations, since, for a given $p$, $\theta_1$ is a function of
$\theta_2$ given by (12).
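The reduction to a binomial problem can be sketched in Python as follows. The first function evaluates the probability that a discordant pair is (0, 1), taking the first member of each pair from the population with parameter $\theta_1$. The second counts the discordant pairs in two paired series of 0/1 observations. Both function names are hypothetical.

```python
from typing import Iterable, Tuple


def discordant_pair_probability(theta1: float, theta2: float) -> float:
    """Probability that a pair known to be (0, 1) or (1, 0) is in fact (0, 1)."""
    zero_one = theta2 * (1.0 - theta1)  # first observation 0, second 1
    one_zero = theta1 * (1.0 - theta2)  # first observation 1, second 0
    return zero_one / (one_zero + zero_one)


def count_discordant(first: Iterable[int],
                     second: Iterable[int]) -> Tuple[int, int]:
    """Return (n, m): the number of discordant pairs and the number of those
    pairs that are (0, 1). Concordant pairs are ignored."""
    n = m = 0
    for a, b in zip(first, second):
        if a != b:
            n += 1
            if (a, b) == (0, 1):
                m += 1
    return n, m
```

The pair of counts (n, m) returned by `count_discordant` plays the part of the number of trials and the number of occurrences in the binomial test of §4.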

A test for the hypotheses $H_1$, $H_2$ and $H_3$ may now be constructed as in §4, the probabilities
of correctly accepting each hypothesis being related to the positions of the boundaries by (7a) and
(9)-(11). This test may in some practical situations be more useful than a test based on only
two hypotheses. In a clinical trial to compare the effectiveness of two different treatments we
may wish to design a test with the guarantees:

(a) that if one treatment is an improvement on the other by more than a certain
amount ($p < p_1$ or $p > p_3$), we shall have a specified high probability of detecting the
difference, and

(b) that if the treatments are equally effective, we shall have a specified high probability
of saying so.

It may also be an advantage that the test has a comparatively small mean sample size when
$p = \tfrac{1}{2}$, when $p < p_1$, or when $p > p_3$.

I wish to thank Mr. A. M. Walker for some helpful advice in the presentation of this paper,
and Mrs. M. G. Young for preparing the diagrams.